class: center, middle, inverse, title-slide

# Visualizations in R
### Karel Kroeze
### 9/23/2021

---

# Contents

* What is a visualization?
* Basics of ggplot2
* <span style="color: green;">Good</span> plot, <span style="color: red;">Bad</span> plot
* Simplifying
* Layering
* Going extra-dimensional

---

# What is the goal of visualization?

--

.pull-left[
## Identify patterns and exceptions

> The greatest value of a picture is when it forces us to notice what we never expected to see

Tukey, _Exploratory Data Analysis_

- explore what is possible
- exploratory as in _finding_, not _aimless wandering_
- identify patterns and exceptions
- present not what we already knew, but what is unexpected
- clearly and obviously present new information
]

--

.pull-right[
## Aid analytical thinking

> The principles of evidence display are derived from the universal principles of analytical thinking -- and not from local customs, intellectual fashions, consumer convenience, marketing, or whatever the technologies of display happen to make available.

Tufte, _Beautiful Evidence_

- understand causality
- make multivariate comparisons
- examine relevant evidence
- assess credibility of evidence and conclusions
]

---

# Let's play a game!

## Good graph, bad graph?

---

# Good graph, bad graph?

<img src="data:image/png;base64,#images/bad-pie-chart.png" style="max-height: 500px;" />

---

# Good graph, bad graph?

<img src="data:image/png;base64,#images/bad-lines.png" style="max-height: 500px;" />

---

# How to build a graphic

* Grammar of Graphics [(Wilkinson, 2005)](https://link.springer.com/book/10.1007%2F0-387-28695-0)
* Layered Grammar of Graphics [(Wickham, 2010)](https://doi.org/10.1198/jcgs.2009.07098)

### A graphic is composed of...

--

* Data
* Variables
* Scales
* Statistics
* Geometry
* Aesthetics
* Facets
* Guides

---

# Decomposition of a graph

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-4-1.png)

---

# Decomposition of a graph

.pull-left[![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-5-1.png)]

.pull-right[![](data:image/png;base64,#images/decomposed-plot.png)]

---

# `ggplot2`

```r
library(tidyverse)
library(ggplot2)
train <- readr::read_csv("data/train.csv")
```

---

### start with an empty canvas

```r
ggplot(train)
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

### add an axis

```r
ggplot(train, aes(x = SalePrice))
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

### add some geometry

```r
ggplot(train, aes(x = SalePrice)) +
  geom_histogram()
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

### add some geometry

```r
ggplot(train, aes(x = SalePrice)) +
  geom_histogram()
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

.pull-right[
## This created

* a **statistic** (_count_)
* a **statistic** (_bin_)
* an **axis** (_y_)
* some **geometry** (_bars_)
]

---

### add some labels

```r
ggplot(train, aes(x = SalePrice)) +
  geom_histogram() +
  scale_x_continuous(name = "Sales price", labels = scales::dollar) +
  labs(title = "Distribution of Sales price")
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---

### use a different geometry

```r
ggplot(train, aes(x = SalePrice)) +
  geom_density() +
  scale_x_continuous(name = "Sales price", labels = scales::dollar) +
  labs(title = "Distribution of Sales price")
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

---

### use a different geometry

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-15-1.png)<!-- -->![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-15-2.png)<!-- -->
]

.pull-right[
### This created...

* a **statistic** (_density_)
* an **axis** (_y_)
* some **geometry** (_line_)

### Note that...

* _density_ and _count_ do not have the same range!
* `ggplot` won't let you put two independent scales on the same axis _<small>because that is **bad** design</small>_
]

---

### combining layers

```r
ggplot(train, aes(x = SalePrice)) +
  geom_density() +
  # after_stat(density) rescales the histogram to the density scale
  geom_histogram(aes(y = after_stat(density)), alpha = .2) +
  scale_x_continuous(name = "Sales price", labels = scales::dollar) +
  labs(title = "Distribution of Sales price")
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

# Good graph, bad graph?

<img src="data:image/png;base64,#images/bad-scales.png" style="max-height: 500px;" />

---

# Good graph, bad graph?

<img src="data:image/png;base64,#images/bad-infographic.png" style="max-height: 500px;" />

---

### multivariate plots

```r
ggplot(train, aes(x = LotArea, y = SalePrice)) +
  geom_point() +
  scale_y_continuous(name = "Sales price", labels = scales::dollar) +
  labs(title = "Sales Price by Lot Area", x = "Lot area")
```

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

---

### Let's make this more readable...

<!-- note that these code blocks are copied here for slide beautification and layout only -->

```r
ggplot(train, aes(x = LotArea, y = SalePrice)) +
  geom_point() +
  scale_y_continuous(name = "Sales price", labels = scales::dollar) +
  labs(title = "Sales Price by Lot Area", x = "Lot area")
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-19-1.png)<!-- -->
]

.pull-right[ ]

---

### Let's make this more readable...

<!-- note that these code blocks are copied here for slide beautification and layout only -->

```r
ggplot(train, aes(x = LotArea, y = SalePrice)) +
  geom_point() +
  scale_y_continuous(name = "ln(Sales price)", labels = scales::dollar, trans = "log") +
  scale_x_continuous(name = "ln(Lot area)", trans = "log") +
  labs(title = "Sales Price by Lot Area, log transformed")
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-21-1.png)<!-- -->
]

.pull-right[
* apply `log` transformation
]

---

### Let's make this more readable...

<!-- note that these code blocks are copied here for slide beautification and layout only -->

```r
ggplot(train, aes(x = LotArea, y = SalePrice)) +
  geom_point(alpha=.3) +
  scale_y_continuous(name = "ln(Sales price)", labels = scales::dollar, trans = "log") +
  scale_x_continuous(name = "ln(Lot area)", trans = "log") +
  labs(title = "Sales Price by Lot Area, log transformed")
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-23-1.png)<!-- -->
]

.pull-right[
* apply `log` transformation
* make points transparent
]

---

### Add more information

<!-- note that these code blocks are copied here for slide beautification and layout only -->

```r
ggplot(train, aes(x = LotArea, y = SalePrice)) +
  ... +
  geom_smooth(method="lm") +
  ...
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-26-1.png)<!-- -->
]

.pull-right[
* add smoothing line (linear fit)
]

---

### Add more information

<!-- note that these code blocks are copied here for slide beautification and layout only -->

```r
ggplot(train, aes(..., colour=BldgType)) +
  ...
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-27-1.png)<!-- -->
]

.pull-right[
* add smoothing line (linear fit)
* split and colour by type of home
]

---

<!-- note that these code blocks are copied here for slide beautification and layout only -->

```r
ggplot(train, aes(x = LotArea, y = SalePrice)) +
  geom_point(alpha=.3) +
  geom_smooth(method="lm") +
  scale_y_continuous(name = "ln(Sales price)", labels = scales::dollar, trans = "log") +
  scale_x_continuous(name = "ln(Lot area)", trans = "log") +
  labs(title = "Sales Price by Lot Area, log transformed")
```

.pull-left[
![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-29-1.png)<!-- -->
]

.pull-right[
* add smoothing line (linear fit)
* split and colour by type of home
* facet by neighbourhood _<small>have we gone too far yet?</small>_
* 2d graphs are inherently limited _<small>part of the art is playing with dimensions</small>_
]

---

# Good graph, bad graph?

<img src="data:image/png;base64,#images/bad-dual-scales.png" style="max-height: 500px;" />

---

# Good graph, bad graph?

<img src="data:image/png;base64,#images/bad-palette.png" style="max-height: 500px;" />

---

# Let's take a step back

### What is the difference between...

.pull-left[
### a **bar** chart

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-31-1.png)<!-- -->
]

.pull-right[
### a **pie** chart

![](data:image/png;base64,#vis_files/figure-html/unnamed-chunk-32-1.png)<!-- -->
]

--

### **Cartesian** vs. **Polar** coordinates

---

# Recap - `ggplot2`

* Good graphics follow certain rules
* **Grammar of Graphics**...
  * makes it easy to create convincing graphics
  * makes it hard to deviate from those rules
* you should be using ggplot because...
  * it makes layering data, aesthetics, geometries, etc. easy
  * it has a huge user base, so help is easy to find
  * there are lots of extensions

# Further reading

* [Cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf)
* [Cookbook](https://r-graphics.org/)
* [Quick start](https://r4ds.had.co.nz/data-visualisation.html)
* [Lots of extensions](https://exts.ggplot2.tidyverse.org/gallery/)
* [Deep dive](https://ggplot2-book.org/)

---

# Limitations - `ggplot2`

* it's **opinionated**: doing things your own way is hard _<small>but your way is probably **wrong**</small>_
* it's limited to _static_ images _<small>but other packages have added animation (`gganimate`) or limited interaction (`plotly`).</small>_

---

# Let's get animated - gganimate

.pull-left[
```r
library(tidyverse)
library(gganimate)
library(gifski)

train <- readr::read_csv("../data/train.csv")

# prepare some data
animate <- train %>%
  mutate(DateSold = lubridate::ym(glue::glue("{YrSold}-{MoSold}"))) %>%
  group_by(DateSold, BldgType, Neighborhood) %>%
  summarize(across(c(LotArea, SalePrice), median), Count = n()) %>%
  filter(LotArea <= 50000) # there's a single row with > 150k area
```
]

.pull-right[
```r
ggplot(animate, aes(
  x = LotArea,
  y = SalePrice,
  size = Count,
  colour = Neighborhood
)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_size(range = c(2, 12)) +
  facet_wrap(. ~ BldgType) +
  # animation addons
  labs(title = 'Date sold: {frame_time}', x = 'Median lot size', y = 'Median sale price') +
  transition_time(DateSold) +
  exit_fade() +
  shadow_wake(.1) +
  ease_aes('linear')
```
]

---

# Let's get animated - gganimate

![](data:image/png;base64,#images/animation.gif)

---

# Let's get interactive - plotly

```r
library(plotly)

plot <- ggplot(train, aes(x=log(LotArea), y=SalePrice)) +
  geom_smooth(method="lm") +
  geom_point(alpha=.2)

ggplotly(plot, width=600, height=300)
```

---

# Let's get interactive - plotly

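A minimal sketch of steering what the hover label shows, using the built-in `mtcars` data so it runs without `train.csv`. The `car` column, the `text` aesthetic, and the names `car_data`, `p`, and `widget` are illustrative assumptions, not part of the example above.

```r
library(ggplot2)
library(plotly)

# mtcars stands in for the housing data (assumption, keeps the sketch self-contained)
car_data <- data.frame(mtcars, car = rownames(mtcars))

# `text` is an extra aesthetic that ggplotly() can use for hover labels
p <- ggplot(car_data, aes(x = wt, y = mpg, text = car)) +
  geom_point(alpha = .5)

# tooltip limits the hover label to the listed aesthetics
widget <- ggplotly(p, tooltip = c("text", "x", "y"))
widget
```

Hovering over a point should then show the car name next to its x and y values, rather than every mapped aesthetic.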
---

# Taking it a step further - shiny

* runs R as a web server
* UI rendered in the browser
* outputs rendered on the server
* reactive communication

---

# Taking it a step further - shiny

### UI function

```r
library(shiny)
library(tidyverse)
library(ggplot2)

train <- readr::read_csv("train.csv")

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      # let the user pick the columns and the geometry to plot
      selectInput("x", "X-Axis", names(train)),
      selectInput("y", "Y-Axis", names(train)),
      selectInput("geom", "Geom", c("line", "point", "boxplot"))
    ),
    mainPanel(
      plotOutput("plot")
    )
  )
)
```

---

# Taking it a step further - shiny

### Server function

```r
server <- function(input, output, session) {
  output$plot <- renderPlot({
    # log the current selection to the R console
    cat(input$x, input$y, "\n")

    # .data[[...]] maps columns chosen by name (replaces the deprecated aes_string())
    base <- ggplot(train, aes(x = .data[[input$x]], y = .data[[input$y]])) +
      labs(x = input$x, y = input$y)

    if (input$geom == "line") {
      plot <- base + geom_line()
    } else if (input$geom == "boxplot") {
      plot <- base + geom_boxplot()
    } else {
      plot <- base + geom_point()
    }

    return(plot)
  })
}
```

---

# Taking it a step further - shiny

### as a website/widget

https://karel-kroeze.shinyapps.io/example-shiny-app-13d113csss/

<iframe src="https://karel-kroeze.shinyapps.io/example-shiny-app-13d113csss/" width="800" height="500"></iframe>

---

# Taking it too far? - shiny + esquisse

https://bdsi.shinyapps.io/esquisse-test/ | https://github.com/dreamRs/esquisse

<iframe src="https://bdsi.shinyapps.io/esquisse-test/" width="800" height="400"></iframe>

---

# For inspiration

* [gallery of (award-winning) apps](https://shiny.rstudio.com/gallery/)
* some things we're working on:
  * [visualization database](https://karel-kroeze.shinyapps.io/henk-vis-browser/)
  * [embeddable mini-apps for education](https://karel-kroeze.shinyapps.io/mini-apps-sampling-distribution/)

<iframe src="https://karel-kroeze.shinyapps.io/mini-apps-sampling-distribution/" width="600" height="300"></iframe>

---

## Thank you for listening!

# Questions?